Dataset statistics
| Number of variables | 22 |
|---|---|
| Number of observations | 929 |
| Missing cells | 9094 |
| Missing cells (%) | 44.5% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 221.5 KiB |
| Average record size in memory | 244.1 B |
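The two derived figures in this table can be recomputed from the raw counts. The sketch below assumes that the missing-cell percentage is taken over every cell (variables × observations) and that the average record size is total memory divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 22
n_observations = 929
missing_cells = 9094
total_memory_kib = 221.5

# Assumption: the percentage is taken over every cell in the frame.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / number of rows.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
print(f"about {avg_record_bytes:.1f} B per record")
```

The record size agrees with the reported value only up to the rounding of the 221.5 KiB figure.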
Variable types
| Numeric | 10 |
|---|---|
| Unsupported | 7 |
| Categorical | 5 |
danger has constant value "1.0" | Constant |
operation_car has constant value "16.0" | Constant |
sender has constant value "0.0" | Constant |
operation_date has a high cardinality: 524 distinct values | High cardinality |
car_number is highly correlated with rodvag | High correlation |
rodvag is highly correlated with car_number | High correlation |
operation_car is highly correlated with danger and 2 other fields | High correlation |
danger is highly correlated with operation_car and 2 other fields | High correlation |
sender is highly correlated with operation_car and 2 other fields | High correlation |
adm is highly correlated with operation_car and 2 other fields | High correlation |
index_train has 929 (100.0%) missing values | Missing |
destination_esr has 417 (44.9%) missing values | Missing |
danger has 923 (99.4%) missing values | Missing |
gruz has 417 (44.9%) missing values | Missing |
loaded has 929 (100.0%) missing values | Missing |
operation_train has 929 (100.0%) missing values | Missing |
receiver has 417 (44.9%) missing values | Missing |
rod_train has 929 (100.0%) missing values | Missing |
sender has 417 (44.9%) missing values | Missing |
ssp_station_esr has 929 (100.0%) missing values | Missing |
ssp_station_id has 929 (100.0%) missing values | Missing |
weight_brutto has 929 (100.0%) missing values | Missing |
df_index has unique values | Unique |
index_train is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
loaded is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
operation_train is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
rod_train is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
ssp_station_esr is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
ssp_station_id is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
weight_brutto is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
receiver has 25 (2.7%) zeros | Zeros |
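The per-column counts in the Missing alerts can be cross-checked against the dataset-level total of 9094 missing cells. The following is a minimal sketch with the counts copied from the alerts above; the dictionary and the idea of listing fully empty columns as candidates for removal are illustrative assumptions, not something the report prescribes.

```python
# Missing-value counts copied from the "Missing" alerts.
missing_by_column = {
    "index_train": 929,
    "destination_esr": 417,
    "danger": 923,
    "gruz": 417,
    "loaded": 929,
    "operation_train": 929,
    "receiver": 417,
    "rod_train": 929,
    "sender": 417,
    "ssp_station_esr": 929,
    "ssp_station_id": 929,
    "weight_brutto": 929,
}
n_observations = 929

total_missing = sum(missing_by_column.values())

# Columns with no observed value at all; these match the "Unsupported" alerts.
fully_missing = sorted(
    name for name, count in missing_by_column.items() if count == n_observations
)

print(total_missing)
print(fully_missing)
```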
Reproduction
| Analysis started | 2021-04-16 09:39:00.438244 |
|---|---|
| Analysis finished | 2021-04-16 09:39:18.883727 |
| Duration | 18.45 seconds |
| Software version | pandas-profiling v2.11.0 |
| Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
| Distinct | 929 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2362461.623 |
|---|---|
| Minimum | 4470 |
| Maximum | 4184583 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 4470 |
|---|---|
| 5-th percentile | 185111.2 |
| Q1 | 1207323 |
| median | 2513618 |
| Q3 | 3378128 |
| 95-th percentile | 3987982.6 |
| Maximum | 4184583 |
| Range | 4180113 |
| Interquartile range (IQR) | 2170805 |
Descriptive statistics
| Standard deviation | 1219738.002 |
|---|---|
| Coefficient of variation (CV) | 0.5162996048 |
| Kurtosis | -1.039013292 |
| Mean | 2362461.623 |
| Median Absolute Deviation (MAD) | 919358 |
| Skewness | -0.4373204778 |
| Sum | 2194726848 |
| Variance | 1.487760795 × 10¹² |
| Monotonicity | Strictly increasing |
| Value | Count | Frequency (%) |
| 450562 | 1 | 0.1% |
| 1461588 | 1 | 0.1% |
| 3700024 | 1 | 0.1% |
| 3665210 | 1 | 0.1% |
| 3145019 | 1 | 0.1% |
| 3657024 | 1 | 0.1% |
| 3659073 | 1 | 0.1% |
| 806212 | 1 | 0.1% |
| 2983237 | 1 | 0.1% |
| 1568072 | 1 | 0.1% |
| Other values (919) | 919 |
| Value | Count | Frequency (%) |
| 4470 | 1 | |
| 7115 | 1 | |
| 74454 | 1 | |
| 75222 | 1 | |
| 75808 | 1 | |
| 76036 | 1 | |
| 87066 | 1 | |
| 87303 | 1 | |
| 88220 | 1 | |
| 88301 | 1 |
| Value | Count | Frequency (%) |
| 4184583 | 1 | |
| 4176732 | 1 | |
| 4176537 | 1 | |
| 4176399 | 1 | |
| 4175426 | 1 | |
| 4172154 | 1 | |
| 4171204 | 1 | |
| 4167738 | 1 | |
| 4166949 | 1 | |
| 4165917 | 1 |
length
Real number (ℝ≥0)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.053466093 |
|---|---|
| Minimum | 1 |
| Maximum | 1.36 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1.06 |
| 95-th percentile | 1.36 |
| Maximum | 1.36 |
| Range | 0.36 |
| Interquartile range (IQR) | 0.06 |
Descriptive statistics
| Standard deviation | 0.1130419856 |
|---|---|
| Coefficient of variation (CV) | 0.1073048163 |
| Kurtosis | 2.733429993 |
| Mean | 1.053466093 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.095660333 |
| Sum | 978.67 |
| Variance | 0.0127784905 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 654 | |
| 1.06 | 131 | 14.1% |
| 1.36 | 83 | 8.9% |
| 1.32 | 20 | 2.2% |
| 1.01 | 17 | 1.8% |
| 1.22 | 16 | 1.7% |
| 1.27 | 6 | 0.6% |
| 1.11 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 1 | 654 | |
| 1.01 | 17 | 1.8% |
| 1.06 | 131 | 14.1% |
| 1.11 | 2 | 0.2% |
| 1.22 | 16 | 1.7% |
| 1.27 | 6 | 0.6% |
| 1.32 | 20 | 2.2% |
| 1.36 | 83 | 8.9% |
| Value | Count | Frequency (%) |
| 1.36 | 83 | 8.9% |
| 1.32 | 20 | 2.2% |
| 1.27 | 6 | 0.6% |
| 1.22 | 16 | 1.7% |
| 1.11 | 2 | 0.2% |
| 1.06 | 131 | 14.1% |
| 1.01 | 17 | 1.8% |
| 1 | 654 |
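Because the frequency table for `length` lists every distinct value, the descriptive statistics can be reproduced from it. The sketch below assumes the variance uses n - 1 in the denominator (the usual pandas default); that assumption is consistent with the reported figures but is not stated in the report.

```python
import math

# (value, count) pairs copied from the frequency table for `length`.
freq = [
    (1.0, 654), (1.06, 131), (1.36, 83), (1.32, 20),
    (1.01, 17), (1.22, 16), (1.27, 6), (1.11, 2),
]

n = sum(count for _, count in freq)
total = sum(value * count for value, count in freq)
mean = total / n

# Assumption: sample variance, i.e. divide by n - 1.
variance = sum(count * (value - mean) ** 2 for value, count in freq) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation = standard deviation / mean.
cv = std / mean

print(n, round(total, 2), mean, variance, std, cv)
```

The median absolute deviation of 0 also follows directly: more than half of the observations equal the median of 1, so the median of the absolute deviations is 0.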
car_number
Real number (ℝ≥0)
| Distinct | 917 |
|---|---|
| Distinct (%) | 98.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 55986585.55 |
|---|---|
| Minimum | 24603474 |
| Maximum | 96736632 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 24603474 |
|---|---|
| 5-th percentile | 29262053.6 |
| Q1 | 52685716 |
| median | 60211778 |
| Q3 | 62407580 |
| 95-th percentile | 68061130.8 |
| Maximum | 96736632 |
| Range | 72133158 |
| Interquartile range (IQR) | 9721864 |
Descriptive statistics
| Standard deviation | 13443258.08 |
|---|---|
| Coefficient of variation (CV) | 0.2401156982 |
| Kurtosis | 1.616075088 |
| Mean | 55986585.55 |
| Median Absolute Deviation (MAD) | 3910781 |
| Skewness | -0.05559361182 |
| Sum | 5.201153798 × 10¹⁰ |
| Variance | 1.807211877 × 10¹⁴ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 43738509 | 2 | 0.2% |
| 44779866 | 2 | 0.2% |
| 43464429 | 2 | 0.2% |
| 42633453 | 2 | 0.2% |
| 42506188 | 2 | 0.2% |
| 60194784 | 2 | 0.2% |
| 42660910 | 2 | 0.2% |
| 43115856 | 2 | 0.2% |
| 60099843 | 2 | 0.2% |
| 53418380 | 2 | 0.2% |
| Other values (907) | 909 |
| Value | Count | Frequency (%) |
| 24603474 | 1 | |
| 24616872 | 1 | |
| 24622318 | 1 | |
| 26726596 | 1 | |
| 28031342 | 1 | |
| 28032357 | 1 | |
| 28033124 | 1 | |
| 28035129 | 1 | |
| 28818342 | 1 | |
| 29004496 | 1 |
| Value | Count | Frequency (%) |
| 96736632 | 1 | |
| 96736541 | 1 | |
| 96736210 | 1 | |
| 96735956 | 1 | |
| 96657093 | 1 | |
| 96655592 | 1 | |
| 96653498 | 1 | |
| 96625090 | 1 | |
| 96619598 | 1 | |
| 95418588 | 1 |
destination_esr
Real number (ℝ≥0)
| Distinct | 82 |
|---|---|
| Distinct (%) | 16.0% |
| Missing | 417 |
| Missing (%) | 44.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 809242.793 |
|---|---|
| Minimum | 27802 |
| Maximum | 998100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 27802 |
|---|---|
| 5-th percentile | 438603 |
| Q1 | 797002 |
| median | 852708 |
| Q3 | 932601 |
| 95-th percentile | 946801 |
| Maximum | 998100 |
| Range | 970298 |
| Interquartile range (IQR) | 135599 |
Descriptive statistics
| Standard deviation | 161289.0809 |
|---|---|
| Coefficient of variation (CV) | 0.1993086405 |
| Kurtosis | 6.174693212 |
| Mean | 809242.793 |
| Median Absolute Deviation (MAD) | 70905 |
| Skewness | -2.319456137 |
| Sum | 414332310 |
| Variance | 2.601416761 × 10¹⁰ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 852708 | 68 | 7.3% |
| 932902 | 50 | 5.4% |
| 946801 | 34 | 3.7% |
| 798005 | 33 | 3.6% |
| 932601 | 32 | 3.4% |
| 799101 | 26 | 2.8% |
| 648202 | 21 | 2.3% |
| 806708 | 14 | 1.5% |
| 814208 | 11 | 1.2% |
| 920002 | 11 | 1.2% |
| Other values (72) | 212 | |
| (Missing) | 417 |
| Value | Count | Frequency (%) |
| 27802 | 1 | 0.1% |
| 33004 | 1 | 0.1% |
| 152006 | 3 | 0.3% |
| 155004 | 1 | 0.1% |
| 183502 | 1 | 0.1% |
| 194206 | 1 | 0.1% |
| 205400 | 1 | 0.1% |
| 255409 | 1 | 0.1% |
| 288308 | 1 | 0.1% |
| 291707 | 10 |
| Value | Count | Frequency (%) |
| 998100 | 3 | 0.3% |
| 984502 | 4 | 0.4% |
| 950718 | 1 | 0.1% |
| 946801 | 34 | |
| 942105 | 2 | 0.2% |
| 940006 | 9 | 1.0% |
| 937605 | 1 | 0.1% |
| 932902 | 50 | |
| 932601 | 32 | |
| 930108 | 3 | 0.3% |
adm
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.5 KiB |
| Value | Count |
|---|---|
| 20.0 | 924 |
| 26.0 | 3 |
| 27.0 | 2 |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 3716 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 2 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 20.0 |
|---|---|
| 2nd row | 20.0 |
| 3rd row | 20.0 |
| 4th row | 20.0 |
| 5th row | 20.0 |
| Value | Count | Frequency (%) |
| 20.0 | 924 | |
| 26.0 | 3 | 0.3% |
| 27.0 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 20.0 | 924 | |
| 26.0 | 3 | 0.3% |
| 27.0 | 2 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1853 | |
| 2 | 929 | |
| . | 929 | |
| 6 | 3 | 0.1% |
| 7 | 2 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2787 | |
| Other Punctuation | 929 | 25.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 1853 | |
| 2 | 929 | |
| 6 | 3 | 0.1% |
| 7 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| . | 929 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3716 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 1853 | |
| 2 | 929 | |
| . | 929 | |
| 6 | 3 | 0.1% |
| 7 | 2 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3716 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 1853 | |
| 2 | 929 | |
| . | 929 | |
| 6 | 3 | 0.1% |
| 7 | 2 | 0.1% |
danger
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | 16.7% |
| Missing | 923 |
| Missing (%) | 99.4% |
| Memory size | 36.5 KiB |
| Value | Count |
|---|---|
| 1.0 | 6 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 18 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 2 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
| Value | Count | Frequency (%) |
| 1.0 | 6 | 0.6% |
| (Missing) | 923 |
| Value | Count | Frequency (%) |
| 1.0 | 6 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 6 | |
| . | 6 | |
| 0 | 6 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 12 | |
| Other Punctuation | 6 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 1 | 6 | |
| 0 | 6 |
| Value | Count | Frequency (%) |
| . | 6 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 18 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 1 | 6 | |
| . | 6 | |
| 0 | 6 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 18 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 1 | 6 | |
| . | 6 | |
| 0 | 6 |
gruz
Real number (ℝ≥0)
| Distinct | 49 |
|---|---|
| Distinct (%) | 9.6% |
| Missing | 417 |
| Missing (%) | 44.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 350966.6426 |
|---|---|
| Minimum | 141105 |
| Maximum | 731062 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 141105 |
|---|---|
| 5-th percentile | 241262.55 |
| Q1 | 323058 |
| median | 324169 |
| Q3 | 351306 |
| 95-th percentile | 485572 |
| Maximum | 731062 |
| Range | 589957 |
| Interquartile range (IQR) | 28248 |
Descriptive statistics
| Standard deviation | 90565.89377 |
|---|---|
| Coefficient of variation (CV) | 0.258047013 |
| Kurtosis | 5.08170106 |
| Mean | 350966.6426 |
| Median Absolute Deviation (MAD) | 27137 |
| Skewness | 1.559379617 |
| Sum | 179694921 |
| Variance | 8202181115 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 351306 | 147 | 15.8% |
| 324169 | 136 | 14.6% |
| 323024 | 37 | 4.0% |
| 473062 | 20 | 2.2% |
| 141105 | 14 | 1.5% |
| 303285 | 13 | 1.4% |
| 264220 | 12 | 1.3% |
| 302032 | 12 | 1.3% |
| 693227 | 10 | 1.1% |
| 323058 | 9 | 1.0% |
| Other values (39) | 102 | 11.0% |
| (Missing) | 417 |
| Value | Count | Frequency (%) |
| 141105 | 14 | |
| 151037 | 4 | 0.4% |
| 232164 | 5 | 0.5% |
| 241229 | 3 | 0.3% |
| 241290 | 1 | 0.1% |
| 241318 | 2 | 0.2% |
| 241356 | 3 | 0.3% |
| 264042 | 2 | 0.2% |
| 264220 | 12 | |
| 301078 | 4 | 0.4% |
| Value | Count | Frequency (%) |
| 731062 | 1 | 0.1% |
| 693227 | 10 | |
| 661048 | 1 | 0.1% |
| 634159 | 1 | 0.1% |
| 634110 | 3 | 0.3% |
| 634089 | 3 | 0.3% |
| 632242 | 1 | 0.1% |
| 632098 | 1 | 0.1% |
| 542084 | 1 | 0.1% |
| 541024 | 1 | 0.1% |
operation_car
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.5 KiB |
| Value | Count |
|---|---|
| 16.0 | 929 |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 3716 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 2 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 16.0 |
|---|---|
| 2nd row | 16.0 |
| 3rd row | 16.0 |
| 4th row | 16.0 |
| 5th row | 16.0 |
| Value | Count | Frequency (%) |
| 16.0 | 929 |
| Value | Count | Frequency (%) |
| 16.0 | 929 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 929 | |
| 6 | 929 | |
| . | 929 | |
| 0 | 929 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2787 | |
| Other Punctuation | 929 | 25.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
| 1 | 929 | |
| 6 | 929 | |
| 0 | 929 |
| Value | Count | Frequency (%) |
| . | 929 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3716 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 1 | 929 | |
| 6 | 929 | |
| . | 929 | |
| 0 | 929 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3716 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 1 | 929 | |
| 6 | 929 | |
| . | 929 | |
| 0 | 929 |
operation_date
Categorical
| Distinct | 524 |
|---|---|
| Distinct (%) | 56.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 69.1 KiB |
| Value | Count |
|---|---|
| 2020-07-17 06:30:00 | 66 |
| 2020-07-06 01:20:00 | 66 |
| 2020-07-20 12:13:00 | 59 |
| 2020-07-29 03:45:00 | 53 |
| 2020-07-27 06:37:00 | 44 |
| Other values (519) | 641 |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Characters and Unicode
| Total characters | 17651 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 4 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
Unique
| Unique | 508 ? |
|---|---|
| Unique (%) | 54.7% |
Sample
| 1st row | 2020-07-16 16:11:00 |
|---|---|
| 2nd row | 2020-07-16 11:04:00 |
| 3rd row | 2020-07-17 12:22:00 |
| 4th row | 2020-07-17 12:18:00 |
| 5th row | 2020-07-17 11:23:00 |
| Value | Count | Frequency (%) |
| 2020-07-17 06:30:00 | 66 | 7.1% |
| 2020-07-06 01:20:00 | 66 | 7.1% |
| 2020-07-20 12:13:00 | 59 | 6.4% |
| 2020-07-29 03:45:00 | 53 | 5.7% |
| 2020-07-27 06:37:00 | 44 | 4.7% |
| 2020-07-09 05:25:00 | 28 | 3.0% |
| 2020-07-11 15:40:00 | 22 | 2.4% |
| 2020-07-14 10:20:00 | 22 | 2.4% |
| 2020-07-12 02:55:00 | 17 | 1.8% |
| 2020-07-14 02:29:00 | 13 | 1.4% |
| Other values (514) | 539 |
| Value | Count | Frequency (%) |
| 2020-07-17 | 95 | 5.1% |
| 2020-07-06 | 85 | 4.6% |
| 2020-07-29 | 84 | 4.5% |
| 2020-07-20 | 82 | 4.4% |
| 2020-07-27 | 72 | 3.9% |
| 06:30:00 | 66 | 3.6% |
| 01:20:00 | 66 | 3.6% |
| 12:13:00 | 59 | 3.2% |
| 03:45:00 | 54 | 2.9% |
| 2020-07-14 | 48 | 2.6% |
| Other values (395) | 1147 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 5832 | |
| 2 | 2726 | |
| - | 1858 | 10.5% |
| : | 1858 | 10.5% |
| 7 | 1278 | 7.2% |
| 1 | 1201 | 6.8% |
| (space) | 929 | 5.3% |
| 5 | 467 | 2.6% |
| 3 | 447 | 2.5% |
| 6 | 370 | 2.1% |
| Other values (3) | 685 | 3.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 13006 | |
| Dash Punctuation | 1858 | 10.5% |
| Other Punctuation | 1858 | 10.5% |
| Space Separator | 929 | 5.3% |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 5832 | |
| 2 | 2726 | |
| 7 | 1278 | 9.8% |
| 1 | 1201 | 9.2% |
| 5 | 467 | 3.6% |
| 3 | 447 | 3.4% |
| 6 | 370 | 2.8% |
| 4 | 270 | 2.1% |
| 9 | 259 | 2.0% |
| 8 | 156 | 1.2% |
| Value | Count | Frequency (%) |
| - | 1858 |
| Value | Count | Frequency (%) |
| (space) | 929 |
| Value | Count | Frequency (%) |
| : | 1858 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 17651 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 5832 | |
| 2 | 2726 | |
| - | 1858 | 10.5% |
| : | 1858 | 10.5% |
| 7 | 1278 | 7.2% |
| 1 | 1201 | 6.8% |
| (space) | 929 | 5.3% |
| 5 | 467 | 2.6% |
| 3 | 447 | 2.5% |
| 6 | 370 | 2.1% |
| Other values (3) | 685 | 3.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 17651 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 5832 | |
| 2 | 2726 | |
| - | 1858 | 10.5% |
| : | 1858 | 10.5% |
| 7 | 1278 | 7.2% |
| 1 | 1201 | 6.8% |
| (space) | 929 | 5.3% |
| 5 | 467 | 2.6% |
| 3 | 447 | 2.5% |
| 6 | 370 | 2.1% |
| Other values (3) | 685 | 3.9% |
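The category names in these tables (Decimal Number, Dash Punctuation, Other Punctuation, Space Separator) correspond to the Unicode general categories `Nd`, `Pd`, `Po` and `Zs`. The sketch below uses Python's standard `unicodedata` module, which is not necessarily what the profiling tool uses internally, to classify one sample timestamp and scale the result by the row count. It relies on the assumption, supported by the Length table, that every value has the same 19-character layout.

```python
import unicodedata
from collections import Counter

sample = "2020-07-16 16:11:00"  # first sample row of the variable
n_rows = 929

# Count characters of one value by Unicode general category.
per_value = Counter(unicodedata.category(ch) for ch in sample)

# Assumption: all values share this layout, so totals scale with the row count.
totals = {category: count * n_rows for category, count in per_value.items()}

print(totals)
print(sum(totals.values()))
```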
operation_st_esr
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 952865.6964 |
|---|---|
| Minimum | 936903 |
| Maximum | 989309 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 936903 |
|---|---|
| 5-th percentile | 946801 |
| Q1 | 946801 |
| median | 947005 |
| Q3 | 947005 |
| 95-th percentile | 989205 |
| Maximum | 989309 |
| Range | 52406 |
| Interquartile range (IQR) | 204 |
Descriptive statistics
| Standard deviation | 14016.8233 |
|---|---|
| Coefficient of variation (CV) | 0.01471017726 |
| Kurtosis | 2.034245319 |
| Mean | 952865.6964 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.976797908 |
| Sum | 885212232 |
| Variance | 196471335.5 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 947005 | 508 | |
| 946801 | 276 | |
| 989205 | 81 | 8.7% |
| 979608 | 59 | 6.4% |
| 989309 | 4 | 0.4% |
| 936903 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 936903 | 1 | 0.1% |
| 946801 | 276 | |
| 947005 | 508 | |
| 979608 | 59 | 6.4% |
| 989205 | 81 | 8.7% |
| 989309 | 4 | 0.4% |
| Value | Count | Frequency (%) |
| 989309 | 4 | 0.4% |
| 989205 | 81 | 8.7% |
| 979608 | 59 | 6.4% |
| 947005 | 508 | |
| 946801 | 276 | |
| 936903 | 1 | 0.1% |
operation_st_id
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2001251047 |
|---|---|
| Minimum | 2000037498 |
| Maximum | 2002026607 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 2000037498 |
|---|---|
| 5-th percentile | 2000037862 |
| Q1 | 2000037862 |
| median | 2002025275 |
| Q3 | 2002025275 |
| 95-th percentile | 2002026607 |
| Maximum | 2002026607 |
| Range | 1989109 |
| Interquartile range (IQR) | 1987413 |
Descriptive statistics
| Standard deviation | 969656.204 |
|---|---|
| Coefficient of variation (CV) | 0.0004845250202 |
| Kurtosis | -1.798466089 |
| Mean | 2001251047 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -0.4532211753 |
| Sum | 1.859162223 × 10¹² |
| Variance | 9.402331539 × 10¹¹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2002025275 | 508 | |
| 2000037862 | 276 | |
| 2000039126 | 81 | 8.7% |
| 2002026607 | 59 | 6.4% |
| 2000039132 | 4 | 0.4% |
| 2000037498 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 2000037498 | 1 | 0.1% |
| 2000037862 | 276 | |
| 2000039126 | 81 | 8.7% |
| 2000039132 | 4 | 0.4% |
| 2002025275 | 508 | |
| 2002026607 | 59 | 6.4% |
| Value | Count | Frequency (%) |
| 2002026607 | 59 | 6.4% |
| 2002025275 | 508 | |
| 2000039132 | 4 | 0.4% |
| 2000039126 | 81 | 8.7% |
| 2000037862 | 276 | |
| 2000037498 | 1 | 0.1% |
receiver
Real number (ℝ≥0)
| Distinct | 91 |
|---|---|
| Distinct (%) | 17.8% |
| Missing | 417 |
| Missing (%) | 44.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35348119.17 |
|---|---|
| Minimum | 0 |
| Maximum | 97966927 |
| Zeros | 25 |
| Zeros (%) | 2.7% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 186209.35 |
| Q1 | 5766801 |
| median | 37144780 |
| Q3 | 56090951 |
| 95-th percentile | 90053767 |
| Maximum | 97966927 |
| Range | 97966927 |
| Interquartile range (IQR) | 50324150 |
Descriptive statistics
| Standard deviation | 29892462.18 |
|---|---|
| Coefficient of variation (CV) | 0.8456591999 |
| Kurtosis | -0.8863102326 |
| Mean | 35348119.17 |
| Median Absolute Deviation (MAD) | 31359533 |
| Skewness | 0.515853832 |
| Sum | 1.809823702 × 10¹⁰ |
| Variance | 8.935592951 × 10¹⁴ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 37144780 | 114 | 12.3% |
| 0 | 25 | 2.7% |
| 5766801 | 21 | 2.3% |
| 77622226 | 20 | 2.2% |
| 74733014 | 19 | 2.0% |
| 68861216 | 16 | 1.7% |
| 187145 | 14 | 1.5% |
| 12631504 | 13 | 1.4% |
| 1029454 | 12 | 1.3% |
| 28545095 | 12 | 1.3% |
| Other values (81) | 246 | |
| (Missing) | 417 |
| Value | Count | Frequency (%) |
| 0 | 25 | |
| 186200 | 1 | 0.1% |
| 186217 | 1 | 0.1% |
| 186269 | 4 | 0.4% |
| 186424 | 10 | 1.1% |
| 186447 | 3 | 0.3% |
| 186465 | 1 | 0.1% |
| 186602 | 1 | 0.1% |
| 186631 | 1 | 0.1% |
| 186849 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 97966927 | 1 | 0.1% |
| 97679381 | 10 | |
| 96417242 | 2 | 0.2% |
| 95687583 | 8 | |
| 92964042 | 2 | 0.2% |
| 90710099 | 2 | 0.2% |
| 90053767 | 2 | 0.2% |
| 89706939 | 1 | 0.1% |
| 88159599 | 9 | |
| 88143598 | 2 | 0.2% |
rodvag
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 53.26264801 |
|---|---|
| Minimum | 20 |
| Maximum | 95 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 40 |
| median | 60 |
| Q3 | 60 |
| 95-th percentile | 60 |
| Maximum | 95 |
| Range | 75 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 16.19612801 |
|---|---|
| Coefficient of variation (CV) | 0.304080413 |
| Kurtosis | 0.6747461584 |
| Mean | 53.26264801 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -0.7512602681 |
| Sum | 49481 |
| Variance | 262.3145624 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 60 | 653 | |
| 20 | 130 | 14.0% |
| 40 | 109 | 11.7% |
| 90 | 33 | 3.6% |
| 92 | 3 | 0.3% |
| 95 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 20 | 130 | 14.0% |
| 40 | 109 | 11.7% |
| 60 | 653 | |
| 90 | 33 | 3.6% |
| 92 | 3 | 0.3% |
| 95 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 95 | 1 | 0.1% |
| 92 | 3 | 0.3% |
| 90 | 33 | 3.6% |
| 60 | 653 | |
| 40 | 109 | 11.7% |
| 20 | 130 | 14.0% |
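The quantile statistics for this variable can be reproduced from its complete frequency table. The sketch below expands the table into a sorted list and applies linear interpolation between neighbouring ranks, which is assumed here because it is the pandas default; the helper `quantile` is illustrative and not part of the report.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


# (value, count) pairs copied from the frequency table above.
freq = [(20, 130), (40, 109), (60, 653), (90, 33), (92, 3), (95, 1)]
values = sorted(value for value, count in freq for _ in range(count))

levels = [("p5", 0.05), ("q1", 0.25), ("median", 0.5), ("q3", 0.75), ("p95", 0.95)]
summary = {name: quantile(values, q) for name, q in levels}
iqr = summary["q3"] - summary["q1"]

print(summary, iqr)
```

Because 653 of the 929 observations equal 60, the median, Q3 and the 95th percentile all fall on that value.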
sender
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 417 |
| Missing (%) | 44.9% |
| Memory size | 46.4 KiB |
| Value | Count |
|---|---|
| 0.0 | 512 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 1536 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 2 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
| Value | Count | Frequency (%) |
| 0.0 | 512 | |
| (Missing) | 417 |
| Value | Count | Frequency (%) |
| 0.0 | 512 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1024 | |
| . | 512 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1024 | |
| Other Punctuation | 512 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 1024 |
| Value | Count | Frequency (%) |
| . | 512 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1536 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 1024 | |
| . | 512 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1536 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 1024 | |
| . | 512 |
tare_weight
Real number (ℝ≥0)
| Distinct | 52 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 241.4068891 |
|---|---|
| Minimum | 209 |
| Maximum | 272 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 209 |
|---|---|
| 5-th percentile | 215 |
| Q1 | 236 |
| median | 241 |
| Q3 | 245 |
| 95-th percentile | 271 |
| Maximum | 272 |
| Range | 63 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 13.70715094 |
|---|---|
| Coefficient of variation (CV) | 0.05678028076 |
| Kurtosis | 0.6753302462 |
| Mean | 241.4068891 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.3438497982 |
| Sum | 224267 |
| Variance | 187.885987 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 240 | 90 | 9.7% |
| 242 | 72 | 7.8% |
| 243 | 66 | 7.1% |
| 245 | 58 | 6.2% |
| 225 | 51 | 5.5% |
| 241 | 48 | 5.2% |
| 238 | 46 | 5.0% |
| 244 | 43 | 4.6% |
| 239 | 42 | 4.5% |
| 271 | 40 | 4.3% |
| Other values (42) | 373 |
| Value | Count | Frequency (%) |
| 209 | 8 | |
| 210 | 19 | |
| 211 | 1 | 0.1% |
| 212 | 2 | 0.2% |
| 214 | 3 | 0.3% |
| 215 | 19 | |
| 217 | 2 | 0.2% |
| 220 | 3 | 0.3% |
| 221 | 1 | 0.1% |
| 222 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 272 | 15 | 1.6% |
| 271 | 40 | |
| 270 | 9 | 1.0% |
| 269 | 2 | 0.2% |
| 268 | 23 | |
| 267 | 1 | 0.1% |
| 265 | 20 | |
| 263 | 2 | 0.2% |
| 262 | 4 | 0.4% |
| 260 | 7 | 0.8% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect r; only the sign of its slope does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
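As a minimal illustration of that calculation, the sketch below computes r in plain Python. The function name `pearson_r` and the sample data are illustrative; the common 1/(n - 1) factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_steep = [10 + 3 * v for v in x]        # increasing line, steep slope
y_shallow = [-2 - 0.5 * v for v in x]    # decreasing line, shallow slope

print(pearson_r(x, y_steep), pearson_r(x, y_shallow))
```

Both lines are perfectly linear in x, so the magnitude of r is 1 in each case and only the sign differs.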
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
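A minimal sketch of that procedure in plain Python: rank both variables, then apply the Pearson formula to the ranks. The helper names are illustrative, and giving tied values the average of their positions is an assumption (it is the common convention, but the text above does not specify tie handling).

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

print(spearman_rho(x, y), pearson_r(x, y))
```

For this cubic relationship ρ reaches 1 while Pearson's r stays below 1, which is the difference the paragraph describes.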
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
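The pair-counting definition can be sketched directly in plain Python. This is the simple variant usually called tau-a, which matches the description above; libraries such as SciPy commonly compute tau-b by default, which additionally adjusts for ties, so results may differ on tied data. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables disagree on the order
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

print(kendall_tau_a(x, y))
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.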
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | index_train | length | car_number | destination_esr | adm | danger | gruz | loaded | operation_car | operation_date | operation_st_esr | operation_st_id | operation_train | receiver | rodvag | rod_train | sender | ssp_station_esr | ssp_station_id | tare_weight | weight_brutto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4470 | NaN | 1.00 | 63142087 | 781907.0 | 20.0 | NaN | 323024.0 | NaN | 16.0 | 2020-07-16 16:11:00 | 947005.0 | 2.002025e+09 | NaN | 53384816.0 | 60.0 | NaN | 0.0 | NaN | NaN | 242.0 | NaN |
| 1 | 7115 | NaN | 1.00 | 63046585 | 924501.0 | 20.0 | NaN | 473062.0 | NaN | 16.0 | 2020-07-16 11:04:00 | 947005.0 | 2.002025e+09 | NaN | 5785247.0 | 60.0 | NaN | 0.0 | NaN | NaN | 245.0 | NaN |
| 2 | 74454 | NaN | 1.36 | 29592995 | 923000.0 | 20.0 | NaN | 632098.0 | NaN | 16.0 | 2020-07-17 12:22:00 | 947005.0 | 2.002025e+09 | NaN | 46696320.0 | 20.0 | NaN | 0.0 | NaN | NaN | 269.0 | NaN |
| 3 | 75222 | NaN | 1.36 | 29058690 | 923000.0 | 20.0 | NaN | 632242.0 | NaN | 16.0 | 2020-07-17 12:18:00 | 947005.0 | 2.002025e+09 | NaN | 46696320.0 | 20.0 | NaN | 0.0 | NaN | NaN | 271.0 | NaN |
| 4 | 75808 | NaN | 1.36 | 29200292 | 946801.0 | 20.0 | NaN | 351306.0 | NaN | 16.0 | 2020-07-17 11:23:00 | 947005.0 | 2.002025e+09 | NaN | 12631504.0 | 20.0 | NaN | 0.0 | NaN | NaN | 271.0 | NaN |
| 5 | 76036 | NaN | 1.32 | 29575693 | 850204.0 | 20.0 | NaN | 515049.0 | NaN | 16.0 | 2020-07-17 12:26:00 | 947005.0 | 2.002025e+09 | NaN | 87383366.0 | 20.0 | NaN | 0.0 | NaN | NaN | 265.0 | NaN |
| 6 | 87066 | NaN | 1.00 | 53574232 | NaN | 20.0 | NaN | NaN | NaN | 16.0 | 2020-07-17 06:30:00 | 946801.0 | 2.000038e+09 | NaN | NaN | 60.0 | NaN | NaN | NaN | NaN | 226.0 | NaN |
| 7 | 87303 | NaN | 1.00 | 53574943 | NaN | 20.0 | NaN | NaN | NaN | 16.0 | 2020-07-17 06:30:00 | 946801.0 | 2.000038e+09 | NaN | NaN | 60.0 | NaN | NaN | NaN | NaN | 228.0 | NaN |
| 8 | 88220 | NaN | 1.00 | 53456877 | NaN | 20.0 | NaN | NaN | NaN | 16.0 | 2020-07-17 06:30:00 | 946801.0 | 2.000038e+09 | NaN | NaN | 60.0 | NaN | NaN | NaN | NaN | 238.0 | NaN |
| 9 | 88301 | NaN | 1.00 | 53476941 | NaN | 20.0 | NaN | NaN | NaN | 16.0 | 2020-07-17 06:30:00 | 946801.0 | 2.000038e+09 | NaN | NaN | 60.0 | NaN | NaN | NaN | NaN | 231.0 | NaN |
Last rows
| | df_index | index_train | length | car_number | destination_esr | adm | danger | gruz | loaded | operation_car | operation_date | operation_st_esr | operation_st_id | operation_train | receiver | rodvag | rod_train | sender | ssp_station_esr | ssp_station_id | tare_weight | weight_brutto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 919 | 4165917 | NaN | 1.0 | 61282562 | 288308.0 | 20.0 | NaN | 351306.0 | NaN | 16.0 | 2020-07-16 15:24:00 | 947005.0 | 2.002025e+09 | NaN | 81033642.0 | 60.0 | NaN | 0.0 | NaN | NaN | 244.0 | NaN |
| 920 | 4166949 | NaN | 1.0 | 61369989 | 920002.0 | 20.0 | NaN | 473081.0 | NaN | 16.0 | 2020-07-16 10:29:00 | 947005.0 | 2.002025e+09 | NaN | 97679381.0 | 60.0 | NaN | 0.0 | NaN | NaN | 243.0 | NaN |
| 921 | 4167738 | NaN | 1.0 | 61091211 | 798005.0 | 20.0 | NaN | 323024.0 | NaN | 16.0 | 2020-07-16 15:37:00 | 947005.0 | 2.002025e+09 | NaN | 28545095.0 | 60.0 | NaN | 0.0 | NaN | NaN | 240.0 | NaN |
| 922 | 4171204 | NaN | 1.0 | 63463988 | 798005.0 | 20.0 | NaN | 323024.0 | NaN | 16.0 | 2020-07-16 15:43:00 | 947005.0 | 2.002025e+09 | NaN | 28545095.0 | 60.0 | NaN | 0.0 | NaN | NaN | 245.0 | NaN |
| 923 | 4172154 | NaN | 1.0 | 63525448 | 998100.0 | 20.0 | NaN | 323024.0 | NaN | 16.0 | 2020-07-16 16:03:00 | 947005.0 | 2.002025e+09 | NaN | 81622261.0 | 60.0 | NaN | 0.0 | NaN | NaN | 238.0 | NaN |
| 924 | 4175426 | NaN | 1.0 | 63820583 | 932601.0 | 20.0 | NaN | 324169.0 | NaN | 16.0 | 2020-07-15 19:00:00 | 947005.0 | 2.002025e+09 | NaN | 77622226.0 | 60.0 | NaN | 0.0 | NaN | NaN | 243.0 | NaN |
| 925 | 4176399 | NaN | 1.0 | 63982318 | 798005.0 | 20.0 | NaN | 323024.0 | NaN | 16.0 | 2020-07-16 11:37:00 | 947005.0 | 2.002025e+09 | NaN | 28545095.0 | 60.0 | NaN | 0.0 | NaN | NaN | 242.0 | NaN |
| 926 | 4176537 | NaN | 1.0 | 63970883 | 932902.0 | 20.0 | NaN | 473081.0 | NaN | 16.0 | 2020-07-16 16:45:00 | 947005.0 | 2.002025e+09 | NaN | 14462000.0 | 60.0 | NaN | 0.0 | NaN | NaN | 240.0 | NaN |
| 927 | 4176732 | NaN | 1.0 | 64037104 | 852708.0 | 20.0 | NaN | 324169.0 | NaN | 16.0 | 2020-07-15 19:16:00 | 947005.0 | 2.002025e+09 | NaN | 37144780.0 | 60.0 | NaN | 0.0 | NaN | NaN | 240.0 | NaN |
| 928 | 4184583 | NaN | 1.0 | 64910169 | 940006.0 | 20.0 | NaN | 324169.0 | NaN | 16.0 | 2020-07-15 19:05:00 | 947005.0 | 2.002025e+09 | NaN | 88159599.0 | 60.0 | NaN | 0.0 | NaN | NaN | 243.0 | NaN |